Comparison of RIPASA and ALVARADO scores for risk assessment of acute appendicitis: A systematic review and meta-analysis

Background
In the last decades, several clinical scores have been developed and are currently used to improve the diagnosis and risk management of patients with suspected acute appendicitis (AA). However, these scores differ in sensitivity and specificity. We conducted a systematic review and meta-analysis of epidemiological studies that compared the RIPASA and Alvarado scores for the diagnosis of AA.
Methods
This systematic review was conducted using PubMed and Web of Science databases. Selected studies had to compare RIPASA and Alvarado scores in patients with suspected AA and report diagnostic parameters. Summary estimates of sensitivity and specificity were calculated by the Hierarchical Summary Receiver Operating Curve (HSROC) using STATA 17 (STATA Corp, College Station, TX) and MetaDiSc (version 1.4) software.
Results
We included a total of 33 articles, reporting data from 35 studies. For the Alvarado score, the HSROC model produced a summary sensitivity of 0.72 (95%CI = 0.66–0.77), and a summary specificity of 0.77 (95%CI = 0.70–0.82). For the RIPASA score, the HSROC model produced a summary sensitivity of 0.95 (95%CI = 0.92–0.97), and a summary specificity of 0.71 (95%CI = 0.60–0.80).
Conclusion
The RIPASA score has higher sensitivity but lower specificity than the Alvarado score. Since these scoring systems showed different sensitivity and specificity parameters, it is still necessary to develop novel scores for the risk assessment of patients with suspected AA.


Introduction
Acute appendicitis (AA) represents one of the most frequent disorders in abdominal surgery, with a prevalence ranging from 7 to 12% in the general population [1,2]. If untreated or undiagnosed, AA carries a higher risk of adverse outcomes, including death. Despite its common occurrence, the diagnosis of AA is still challenging for clinicians, suggesting the need for novel approaches to improve patients' management [3,4]. Indeed, the clinical presentation of AA is commonly atypical and easily mistaken for other conditions, with only about 40% of cases presenting typical signs and symptoms (i.e., periumbilical pain, nausea, vomiting, pain migration to the right lower quadrant) [5-7]. In the last decades, several scoring systems have been developed to assist clinicians in the assessment of patients with suspected appendicitis [8,9]. Among these, the Alvarado score, first proposed in 1986, is one of the most widely used in the diagnosis of AA. It is based on 6 clinical parameters and 2 laboratory measurements (i.e., localized tenderness in the right lower quadrant, migration of pain, temperature elevation, nausea-vomiting, anorexia, rebound pain, leukocytosis and leukocyte shift to the left) [8]. Although the score is not specific enough, a score of 4-5 is compatible with the diagnosis of AA, a score of 7-8 indicates a probable appendicitis, and a score of 9-10 indicates a very probable AA [10,11]. However, the Alvarado score is also considered to lack some parameters, including age, gender, and duration of symptoms, which have been shown to be crucial in the diagnosis of AA [3,12].
The RIPASA score is one of the most recently developed scoring systems. Compared with the Alvarado score, it includes six additional clinical and personal parameters (i.e., age, gender, duration of symptoms, guarding, Rovsing's sign, and negative urinalysis). A RIPASA score of more than 7.5 is considered positive for appendicitis [1,8,11,13-15]. Although RIPASA and Alvarado scores are the most commonly used in clinical practice, no clear indication exists as to which scoring system is more suitable for patients at risk of AA [16]. Here, we conducted a systematic review and meta-analysis of epidemiological studies comparing RIPASA and Alvarado scores, in order to identify which one provides the more accurate diagnosis of AA.

Literature search and selection criteria
The current systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement and the Cochrane Handbook's guidelines (PRISMA checklist available in S1 Appendix) [17]. The research protocol was registered in the PROSPERO database, with the code CRD42022339490. Two authors (GB and AV) conducted a literature search of articles, using the databases PubMed and Web of Science. The electronic search strategy included the following keywords: ((Appendicitis) AND (RIPASA) AND (Alvarado)). The last search was conducted on 21 July 2021. After identifying and removing duplicates, the authors also cross-searched the articles cited by the retrieved studies, aiming to identify additional articles to be included in the systematic review. Selected studies had to meet the following inclusion criteria: (i) observational studies; (ii) available in full text and written in English; (iii) including patients with suspected acute appendicitis; and (iv) comparing RIPASA and Alvarado scores. By contrast, the following articles were excluded: (i) experimental studies; (ii) studies conducted only on a specific population (e.g., pregnant women or pediatric patients); (iii) studies not comparing the mentioned scoring systems; (iv) studies conducted on patients with an already established cause of abdominal pain and/or patients who experienced pain for a prolonged period; (v) letters, comments, case reports, case series, and reviews.
Titles and abstracts of all identified articles were independently screened by two authors (GB and AV). Articles potentially eligible were full-text reviewed to assess whether eligibility criteria were fully met. Discordant opinions between investigators were resolved by consulting a third author (AA).

Data extraction
The following information was extracted from all included studies: first author, year of publication, study design, sample size, age, sex, histologically confirmed acute appendicitis, other previous diagnoses, and whether computerized tomography (CT) was performed. In addition, for both the RIPASA and Alvarado scores, the authors collected the following information: specificity, sensitivity, positive predictive value, negative predictive value, diagnostic accuracy, negative appendicectomy rate, area under the ROC curve, positive likelihood ratio, and negative likelihood ratio. Discordant opinions between investigators were resolved by consulting a third author (AA).
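Most of the diagnostic parameters listed above can be derived from a 2x2 table of score results against the reference standard. The following is a minimal sketch, assuming the standard textbook definitions. The class and function names are illustrative, and the counts are hypothetical and not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of score results against the reference standard."""
    tp: int  # score positive, appendicitis confirmed
    fp: int  # score positive, appendicitis not confirmed
    fn: int  # score negative, appendicitis confirmed
    tn: int  # score negative, appendicitis not confirmed


def diagnostic_parameters(t: TwoByTwo) -> dict[str, float]:
    """Standard accuracy measures; assumes no denominator is zero."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "accuracy": (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }


# Hypothetical counts, for illustration only.
example = diagnostic_parameters(TwoByTwo(tp=90, fp=10, fn=10, tn=40))
```

The sketch deliberately leaves out the negative appendicectomy rate and the area under the ROC curve, which need information beyond a single 2x2 table.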

Definitions of RIPASA and ALVARADO scores
Clinical scoring systems are useful for grouping patients according to their signs and symptoms and for identifying patients with suspected appendicitis. The Alvarado clinical score includes 6 clinical parameters and 2 laboratory measurements that are relevant to the diagnosis of acute appendicitis: migration of abdominal pain to the right iliac fossa, anorexia or ketones in the urine, nausea or vomiting, localized tenderness in the right iliac fossa, rebound pain, body temperature above 37.3°C, leukocytosis, and neutrophilia. A score of 4-6, 7-8, or 9-10 indicates a compatible, probable, or very probable diagnosis of acute appendicitis, respectively. Commonly, a cut-off of 7.0 is used to define a positive result for appendicitis [10,11].
The RIPASA clinical score includes the following parameters: age, gender, right iliac fossa pain, migration of pain to the right iliac fossa, nausea or vomiting, anorexia, duration of symptoms, localized tenderness in the right iliac fossa, guarding, rebound tenderness, Rovsing's sign, fever, raised white cell count, negative urinalysis, and foreign national registration identity card. Commonly, a score above 7.5 is considered positive for the diagnosis of appendicitis [1,8,11,13-15].
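As an illustration of how such a score is computed, the Alvarado parameters can be summed as in the sketch below. The point weights are an assumption drawn from the commonly published Alvarado scale (2 points each for right iliac fossa tenderness and leukocytosis, 1 point for each other item, maximum 10); they are not stated in this text. Treating a score equal to the cut-off as positive is also an assumption. The identifiers are illustrative.

```python
# Point weights from the commonly published Alvarado scale (assumption:
# the weights are not given in this article).
ALVARADO_POINTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "right_iliac_fossa_tenderness": 2,
    "rebound_pain": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "neutrophilia": 1,
}


def alvarado_score(findings: set[str]) -> int:
    """Sum the points of the findings that are present."""
    unknown = findings - set(ALVARADO_POINTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(ALVARADO_POINTS[name] for name in findings)


def is_positive(score: float, cutoff: float = 7.0) -> bool:
    """Assumption: a score at or above the cut-off counts as positive."""
    return score >= cutoff


patient = {"migration_of_pain", "right_iliac_fossa_tenderness", "leukocytosis"}
patient_score = alvarado_score(patient)
```

The same structure would apply to the RIPASA score with its own parameters and a cut-off of 7.5; its weights are omitted here because they are not given in the text.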

Risk of bias and quality assessment
The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. This approach evaluates diagnostic accuracy studies across 4 domains (i.e., patient selection, index test, reference standard, and flow and timing). In particular, the signalling questions are answered with "low", "high" or "unclear" to judge the risk of bias [18].

Statistical analysis
Meta-analysis of diagnostic test accuracy requires a statistically rigorous approach based on hierarchical models that respect the binomial data structure. In the present study, we first obtained for each score the forest plots of sensitivity and specificity and their 95% Confidence Intervals (CI) based on a random-effects model and using the MetaDiSc software (version 1.4). The heterogeneity was assessed with the I² statistic. Next, the summary estimates of sensitivity and specificity were calculated by the Hierarchical Summary Receiver Operating Curve (HSROC), using the package metandi for STATA 17 statistical software (STATA Corp, College Station, TX). To visualize the HSROC curve, we also used the command metandiplot.

Results
Table 1 shows the main characteristics (i.e., country, type of study, sample size) of the included studies, as well as characteristics of patients (i.e., age, sex). Table 2, instead, summarizes the statistical parameters of the RIPASA and Alvarado scores.

Main characteristics of included studies
All the included studies were published between 2011 and 2020. In particular, most of the studies were conducted in South-Eastern countries. By country, 14 studies were conducted in India, 5 in Turkey, 2 in Pakistan, 2 in Egypt, 2 in Iran, 1 in Jordan, 1 in China, 1 in Korea, 1 in Brunei, 2 in Mexico, 1 in the USA, and 1 in Poland. With respect to the study design, all 33 included articles were observational studies. Specifically, 26 were prospective, 4 retrospective, and 3 cross-sectional. The overall sample size ranged from 56 to 600 participants. Although gender distribution throughout the studies was fairly balanced, almost all studies reported a higher proportion of men. The symptom most commonly used to identify patients with AA was pain in the right iliac fossa. Moreover, some studies required a more extensive list of clinical symptoms, as well as advanced imaging techniques.

Cut-offs of scoring systems
Across the studies, diagnostic parameters for RIPASA and Alvarado scores were calculated according to different cut-offs. Most of the studies used 7.0 and 7.5 as conventional cut-offs for Alvarado and RIPASA scores, respectively. Accordingly, patients were considered as affected by AA if their scores exceeded these cut-off values. However, Korkut et al. and Ozdemir et al. used the value of 8 for the Alvarado, and the values of 10 or 12 for the RIPASA, respectively. The use of different cut-offs may reflect an attempt to improve the diagnostic parameters of the scores. In all the studies considered, the gold standard was the histopathological examination performed after surgery.

Scoring systems performances
Overall, the present systematic review included 5384 patients with suspected AA who were tested with the RIPASA and Alvarado scores. The sensitivity values ranged from 16.4% to 100% for the RIPASA score, and from 14.8% to 97.2% for the Alvarado score (Fig 2). Interestingly, all studies reported a higher sensitivity for the RIPASA score than for the Alvarado score. Most of the studies reported higher values of specificity for the Alvarado score than for the RIPASA score. The specificity values ranged from 9% to 100% for the RIPASA score, and from 16% to 100% for the Alvarado score (Fig 3). The majority of studies reported a higher Positive Predictive Value for the Alvarado score. Conversely, the majority of studies reported a higher Negative Predictive Value for the RIPASA score. Moreover, in the studies included in the present [...]
Fig 4 shows hierarchical summary estimates of sensitivity and specificity for the Alvarado and the RIPASA scores, respectively. The graphs also report a 95% prediction ellipse for the individual values of sensitivity and specificity, and the 95% confidence ellipse around the mean values of sensitivity and specificity. For the Alvarado score (Fig 4A), the HSROC model produced a summary sensitivity of 0.72 (95%CI = 0.66-0.77), and a summary specificity of 0.77 (95%CI = 0.70-0.82). The heterogeneity was I² = 0.90 for the sensitivity and I² = 0.59 for the specificity. For the RIPASA score (Fig 4B), the HSROC model produced a summary sensitivity of 0.95 (95%CI = 0.92-0.97), and a summary specificity of 0.71 (95%CI = 0.60-0.80). The heterogeneity was I² = 0.76 for the sensitivity and I² = 0.70 for the specificity.
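To make the trade-off between the two scores more concrete, the summary estimates above can be converted into positive and negative likelihood ratios. The sketch below is only an arithmetic illustration based on the point estimates. It ignores the confidence intervals and the correlation between sensitivity and specificity, and the resulting ratios are not values reported by the study. The function and variable names are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from point estimates of sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Summary point estimates from the HSROC models reported above.
SUMMARY = {
    "Alvarado": (0.72, 0.77),
    "RIPASA": (0.95, 0.71),
}

RATIOS = {name: likelihood_ratios(sens, spec) for name, (sens, spec) in SUMMARY.items()}

for name, (lr_pos, lr_neg) in RATIOS.items():
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Under these assumptions the two scores have similar positive likelihood ratios (about 3.1 for Alvarado and 3.3 for RIPASA), whereas the negative likelihood ratio is much lower for RIPASA (about 0.07 versus 0.36), which is consistent with a negative RIPASA result being more useful for excluding AA.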

Quality assessment
The details of the quality assessment are reported in S2 Appendix. In general, the risk of bias was unclear or high for all domains under investigation (i.e., patient selection, index test, reference standard, and flow and timing). Similarly, we noted unclear or high concerns regarding applicability for all studies.

Discussion
AA is one of the most common causes of acute abdominal pain, posing a serious diagnostic challenge for general surgeons due to its clinical variability and high prevalence [3]. Although a wide range of diagnostic tests hold great promise in clinical practice, the early identification of appendicitis is still challenging, both for avoiding unnecessary surgical intervention and for reducing healthcare costs [19,20]. Moreover, complications related to the inflammation of the appendix further worsen patients' prognosis, also suggesting the need to implement prediction scoring systems [20]. In this scenario, the use of clinical scoring systems can help healthcare providers improve decision-making, patients' management, and the identification of suspected appendicitis [3]. Moreover, several lines of evidence suggest that the integrated use of clinical scoring systems and diagnostic imaging allows patients with AA to be correctly identified [3,8]. Among the most common scores, RIPASA and Alvarado are the most widely used to clinically diagnose appendicitis in suspected patients [21]. In this study, we carried out a systematic review and meta-analysis of epidemiological studies comparing these two scores in terms of sensitivity and specificity. In line with previous evidence, our results reveal that the RIPASA score has higher sensitivity but lower specificity than the Alvarado score. This means that the RIPASA score is better at identifying patients with AA, but also yields a higher proportion of false positives. Thus, these findings should be considered when choosing the most appropriate test for clinical practice. On the one hand, the high sensitivity of the RIPASA score could reduce the morbidity and mortality of patients with AA. On the other hand, however, the high number of false positives could lead to an increase in inappropriate procedures and healthcare costs.
To our knowledge, a strength of our work is the lack of published systematic reviews and meta-analyses on the same topic. Moreover, our study considered two scoring systems that have the advantages of being easy for clinicians to use and inexpensive to apply. However, our study had some limitations to be considered. Firstly, most studies included in the present meta-analysis considered different cut-off values for the RIPASA and Alvarado scoring systems. This could be considered a potential source of bias, also increasing the heterogeneity between studies. In fact, our analysis detected significant heterogeneity for both sensitivity and specificity. The quality assessment also reported an unclear or high risk of bias associated with patient selection, index test, reference standard, and flow and timing. Another source of misinterpretation is the possible existence of publication bias, which occurs when some studies have a higher probability of being published than others. However, there are currently no adequate methods to detect publication bias in meta-analyses of diagnostic tests, so the presence of this kind of bias cannot be completely excluded. Secondly, these scoring systems are mainly based on patients' clinical parameters measured in emergency situations and critical environments, which in turn could lead to wrong diagnoses and errors in score calculation. Moreover, using these two scores could make the diagnosis of AA difficult in specific subgroups of patients, including older patients, those with diabetes mellitus, and pediatric patients. Thirdly, most of the studies included in the present meta-analysis did not compare RIPASA and Alvarado scores with other diagnostic tests used in clinical practice.
With these considerations in mind, the present systematic review and meta-analysis points out benefits and drawbacks of the two widely used scoring systems for the diagnosis of AA. Specifically, we found that the RIPASA scoring system can be useful both for excluding the diagnosis of AA and for referring intermediate-risk patients to more accurate diagnostic imaging techniques. However, it is not currently possible to define a universal diagnostic test to be used in clinical practice. The choice depends on several factors, including the resources available to obtain data and the clinical setting. In this scenario, our findings could guide future studies to improve the current knowledge about the risk assessment of patients with AA, also promoting the implementation of existing scores and/or the development of innovative tools for clinical practice.

Conclusions
In conclusion, the early diagnosis of patients with suspected AA is still a challenge for clinical practitioners and public health professionals. Although the existing scoring systems help in the risk assessment and in the prediction of clinical deterioration, these scores show variable values of specificity and sensitivity. In our study, the RIPASA score had a superior performance in identifying true positive patients, while the Alvarado score was better in predicting true negative patients. For this reason, further research should be encouraged to develop novel scores and strategies for improving the risk assessment of patients with suspected AA.